DF-TransFusion: Multimodal Deepfake Detection via Lip-Audio Cross-Attention and Facial Self-Attention
With the rise in manipulated media, deepfake detection has become an
imperative task for preserving the authenticity of digital content. In this
paper, we present a novel multimodal audio-video framework designed to
concurrently process audio and video inputs for deepfake detection. Our
model exploits lip synchronization with the input audio through a
cross-attention mechanism while extracting visual cues via a fine-tuned VGG-16
network. Subsequently, a transformer encoder network is employed to perform
facial self-attention. We conduct multiple ablation studies highlighting
different strengths of our approach. Our multimodal methodology outperforms
state-of-the-art multimodal deepfake detection techniques in terms of F1 score and
per-video AUC.
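The lip-audio cross-attention described above can be illustrated with a minimal sketch of scaled dot-product cross-attention, in which each lip frame acts as a query over the audio frames. This is not the DF-TransFusion implementation. It assumes lip and audio features have already been extracted and brought to a common dimension d, and it omits the learned query/key/value projections, multiple heads, and positional encodings a real model would use. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def lip_audio_cross_attention(lip: Matrix, audio: Matrix) -> Matrix:
    """Each lip frame (query) attends over all audio frames (keys and values).

    lip:   T_v x d lip features; audio: T_a x d audio features.
    Returns T_v x d: for each lip frame, an attention-weighted mix of audio frames.
    Hypothetical simplification: identity projections instead of learned W_Q, W_K, W_V.
    """
    d = len(lip[0])
    scores = matmul(lip, transpose(audio))  # T_v x T_a dot-product similarities
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, audio)


# Two lip frames attending over three audio frames (toy 2-D features).
lip = [[1.0, 0.0], [0.0, 1.0]]
audio = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(lip_audio_cross_attention(lip, audio))
```

Dividing the scores by the square root of d keeps the dot products from saturating the softmax as the feature dimension grows; this is the standard transformer choice and an assumption here, not a detail stated in the abstract.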
Automatic Numerical Methods for Enhancement of Blurred Text-Images via Optimization and Nonlinear Diffusion
In this paper, we propose an automatic numerical method for solving a nonlinear partial differential equation (PDE)-based image-processing model. The Perona-Malik diffusion equation (PME) exhibits both forward and backward diffusion regimes, so it can perform simultaneous denoising and deblurring depending on the magnitude of the gradient. One limitation of this equation is that large gradient values in the backward-diffusion regime can lead to singularity formation or staircasing. Guidotti, Kim, and Lambers (GKL) proposed a bound on backward diffusion to prevent staircasing: backward diffusion is limited to a specific range of gradient values, beyond which it stops and forward diffusion begins. Our model combines the PME and GKL models for automatic sharpening of blurred text images using Nelder-Mead optimization, a derivative-free method that uses n+1 test points arranged as a simplex for n-dimensional optimization. We solve our model by discretizing the PDE in space with a finite-difference approximation scheme. Then, we enhance the image in each iteration using backward Euler time stepping and the minimum residual method (MINRES), implemented in MATLAB. We also propose a gradient-based sharpness metric for text images, which serves as the objective function for our Nelder-Mead optimizer. Our results show that the proposed model accurately enhances text images and predicts the unknown value of the blurring kernel for automatic sharpening. Numerical results also show that the proposed objective sharpness measure coincides with the subjective sharpness of the enhanced image.
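The Nelder-Mead step can be sketched as a small pure-Python simplex search. This is a textbook version using the common reflection, expansion, contraction, and shrink coefficients (1, 2, 0.5, 0.5), not the authors' MATLAB code; MATLAB's built-in `fminsearch` also implements a Nelder-Mead simplex method. The quadratic objective below is only a stand-in: the abstract does not give the form of the gradient-based sharpness metric, and in the paper's setting the objective would presumably score the image enhanced with a trial blurring-kernel value.

```python
from typing import Callable, List, Sequence, Tuple


def nelder_mead(
    f: Callable[[Sequence[float]], float],
    x0: Sequence[float],
    step: float = 0.5,
    xtol: float = 1e-8,
    ftol: float = 1e-12,
    max_iter: int = 5000,
) -> Tuple[List[float], float]:
    """Minimize f with a basic Nelder-Mead simplex search (no derivatives)."""
    n = len(x0)
    # Initial simplex of n + 1 points: x0 plus one step along each axis.
    simplex = [list(x0)]
    for i in range(n):
        p = list(x0)
        p[i] += step
        simplex.append(p)
    values = [f(p) for p in simplex]

    for _ in range(max_iter):
        # Sort vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]

        # Stop once the simplex is small in both f and x.
        f_spread = values[-1] - values[0]
        x_spread = max(abs(p[j] - simplex[0][j]) for p in simplex[1:] for j in range(n))
        if f_spread < ftol and x_spread < xtol:
            break

        # Centroid of every vertex except the worst one.
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        # Reflection: mirror the worst vertex through the centroid.
        xr = [c + (c - w) for c, w in zip(centroid, worst)]
        fr = f(xr)
        if values[0] <= fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
            continue

        # Expansion: the reflected point is the new best, so try going further.
        if fr < values[0]:
            xe = [c + 2.0 * (r - c) for c, r in zip(centroid, xr)]
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
            continue

        # Contraction: outside if the reflection beat the worst vertex, else inside.
        if fr < values[-1]:
            xc = [c + 0.5 * (r - c) for c, r in zip(centroid, xr)]
            fc = f(xc)
            if fc <= fr:
                simplex[-1], values[-1] = xc, fc
                continue
        else:
            xc = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            fc = f(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
                continue

        # Shrink: pull every vertex halfway toward the best one.
        best = simplex[0]
        simplex = [best] + [[b + 0.5 * (q - b) for b, q in zip(best, p)] for p in simplex[1:]]
        values = [values[0]] + [f(p) for p in simplex[1:]]

    best_index = min(range(n + 1), key=lambda k: values[k])
    return simplex[best_index], values[best_index]


# Toy stand-in objective with a known minimum at (3, -1).
x_best, f_best = nelder_mead(lambda p: (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2, [0.0, 0.0])
print(x_best, f_best)
```

For a one-parameter search such as a single unknown blur width, the simplex reduces to two points on a line, and the same reflect/expand/contract logic applies unchanged.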